虽然编程是现代社会中最广泛适用的技能之一,但现代机器学习模型仍然无法对基本问题的解决方案。尽管重要的是,对评估代码生成令人惊讶的是,很少有效,并且难以准确地评估代码生成性能。为了满足这一挑战,我们介绍了一个用于代码生成的基准。与在更受限制的设置中的事先工作不同,我们的基准测试衡量模型采取任意自然语言规范的能力,并生成满意的Python代码。类似于公司如何评估候选软件开发人员,然后我们通过检查测试用例的生成代码来评估模型。我们的基准测试包括10,000个问题,从具有简单的单线解决方案来实现实质性算法挑战。我们在GitHub和我们的培训集上微调大型语言模型,我们发现语法错误的普遍性随着模型的提高而导致呈指数级递减。最近的模型如GPT-Neo可以通过大约20%的介绍性问题的测试用例,因此我们发现机器学习模型现在开始学习如何代码。随着自动代码生成的社会意义在未来几年增加,我们的基准可以提供跟踪进步的重要措施。
translated by 谷歌翻译
Event-based vision has been rapidly growing in recent years justified by the unique characteristics it presents such as its high temporal resolutions (~1us), high dynamic range (>120dB), and output latency of only a few microseconds. This work further explores a hybrid, multi-modal, approach for object detection and tracking that leverages state-of-the-art frame-based detectors complemented by hand-crafted event-based methods to improve the overall tracking performance with minimal computational overhead. The methods presented include event-based bounding box (BB) refinement that improves the precision of the resulting BBs, as well as a continuous event-based object detection method, to recover missed detections and generate inter-frame detections that enable a high-temporal-resolution tracking output. The advantages of these methods are quantitatively verified by an ablation study using the higher order tracking accuracy (HOTA) metric. Results show significant performance gains resembled by an improvement in the HOTA from 56.6%, using only frames, to 64.1% and 64.9%, for the event and edge-based mask configurations combined with the two methods proposed, at the baseline framerate of 24Hz. Likewise, incorporating these methods with the same configurations has improved HOTA from 52.5% to 63.1%, and from 51.3% to 60.2% at the high-temporal-resolution tracking rate of 384Hz. Finally, a validation experiment is conducted to analyze the real-world single-object tracking performance using high-speed LiDAR. Empirical evidence shows that our approaches provide significant advantages compared to using frame-based object detectors at the baseline framerate of 24Hz and higher tracking rates of up to 500Hz.
translated by 谷歌翻译
To face the dependency on fossil fuels and limit carbon emissions, fuel cells are a very promising technology and appear to be a key candidate to tackle the increase of the energy demand and promote the energy transition. To meet future needs for both transport and stationary applications, the time to market of fuel cell stacks must be drastically reduced. Here, a new concept to shorten their development time by introducing a disruptive and highefficiency data augmentation approach based on artificial intelligence is presented. Our results allow reducing the testing time before introducing a product on the market from a thousand to a few hours. The innovative concept proposed here can support engineering and research tasks during the fuel cell development process to achieve decreased development costs alongside a reduced time to market.
translated by 谷歌翻译
Online controlled experiments (A/B tests) have become the gold standard for learning the impact of new product features in technology companies. Randomization enables the inference of causality from an A/B test. The randomized assignment maps end users to experiment buckets and balances user characteristics between the groups. Therefore, experiments can attribute any outcome differences between the experiment groups to the product feature under experiment. Technology companies run A/B tests at scale -- hundreds if not thousands of A/B tests concurrently, each with millions of users. The large scale poses unique challenges to randomization. First, the randomized assignment must be fast since the experiment service receives hundreds of thousands of queries per second. Second, the variant assignments must be independent between experiments. Third, the assignment must be consistent when users revisit or an experiment enrolls more users. We present a novel assignment algorithm and statistical tests to validate the randomized assignments. Our results demonstrate that not only is this algorithm computationally fast but also satisfies the statistical requirements -- unbiased and independent.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Experience management is an emerging business area where organizations focus on understanding the feedback of customers and employees in order to improve their end-to-end experiences. This results in a unique set of machine learning problems to help understand how people feel, discover issues they care about, and find which actions need to be taken on data that are different in content and distribution from traditional NLP domains. In this paper, we present a case study of building text analysis applications that perform multiple classification tasks efficiently in 12 languages in the nascent business area of experience management. In order to scale up modern ML methods on experience data, we leverage cross lingual and multi-task modeling techniques to consolidate our models into a single deployment to avoid overhead. We also make use of model compression and model distillation to reduce overall inference latency and hardware cost to the level acceptable for business needs while maintaining model prediction quality. Our findings show that multi-task modeling improves task performance for a subset of experience management tasks in both XLM-R and mBert architectures. Among the compressed architectures we explored, we found that MiniLM achieved the best compression/performance tradeoff. Our case study demonstrates a speedup of up to 15.61x with 2.60% average task degradation (or 3.29x speedup with 1.71% degradation) and estimated savings of 44% over using the original full-size model. These results demonstrate a successful scaling up of text classification for the challenging new area of ML for experience management.
translated by 谷歌翻译
基于各种非负矩阵分解(NMF)方法为成本函数添加了新术语,以使模型适应特定任务,例如聚类或保留减少空间中的某些结构属性(例如,局部不变性)。附加的术语主要由高参数加权,以控制整体公式的平衡,以指导优化过程实现目标。结果是一种参数化的NMF方法。但是,NMF方法采用了无监督的方法来估计分解矩阵。因此,不能保证使用新的特征执行预测(例如分类)的能力。这项工作的目的是设计一个进化框架,以学习参数化NMF的超参数,并以监督的方式估算分解矩阵,以更适合分类问题。此外,我们声称,将基于NMF的算法分别应用于不同的类对,而不是将其应用于整个数据集,从而提高了矩阵分解过程的有效性。这导致训练具有不同平衡参数值的多个参数化的NMF算法。采用了交叉验证组合学习框架,并使用遗传算法来识别最佳参数值集。我们对真实和合成数据集进行的实验证明了所提出的方法的有效性。
translated by 谷歌翻译
流行模型是理解传染病的强大工具。但是,随着它们的大小和复杂性的增加,它们可以迅速在计算上棘手。建模方法的最新进展表明,替代模型可用于模拟具有高维参数空间的复杂流行模型。我们表明,深层序列到序列(SEQ2SEQ)模型可以作为具有基于序列模型参数的复杂流行病模型的准确替代物,从而有效地复制了季节性和长期传播动力学。一旦受过培训,我们的代理人可以预测场景比原始模型快几千倍,从而使其非常适合策略探索。我们证明,用博学的模拟器代替传统的流行模型有助于强大的贝叶斯推断。
translated by 谷歌翻译
旨在进行巴氏杀菌和量化特定现象的任何方法都必须包括使用强大的统计方法进行数据分析。考虑到这一点,这项研究的目的是介绍非参数非均匀数据框架中可能采用的统计方法,并检查其在自然语言处理和语言集群领域的应用。此外,本文讨论了语言数据挖掘和处理中非参数方法的许多用途。数据深度思想允许在任何维度上进行中心排序,从而导致新的非参数多元统计分析,该分析不需要任何分布假设。层次结构的概念用于历史语言分类和结构化,其目的是使用相同的前提将语言组织和聚集到亚家族中。在这方面,当前的研究提出了一种基于通过各种语言的单词类型结构产生的非参数方法的语言家族结构的新方法,然后使用MDS将其转换为笛卡尔框架。这种基于统计深度的架构允许使用基于数据深度的方法来实现强大的离群检测,这对于理解各种边界语言的分类非常有用,并允许对现有分类系统进行重新评估。其他基于深度的方法也适用于无监督和监督聚类等过程。因此,本文概述了可以在非参数框架中应用于非均匀语言分类系统的过程。
translated by 谷歌翻译
心脏磁共振(CMR)序列随着时间的推移可视化心脏功能的体素。同时,基于深度学习的可变形图像注册能够估计离散的向量字段,这些矢量字段将CMR序列的一个时间步骤扭曲为以下方式,以一种自我监督的方式。但是,尽管这些3D+T向量领域中包含的信息来源丰富,但标准化的解释具有挑战性,到目前为止,临床应用仍然有限。在这项工作中,我们展示了如何有效使用可变形的矢量场来描述心脏周期的基本动态过程,形式是派生的1D运动描述符。此外,基于收缩或放松心室的预期心血管生理特性,我们定义了一组规则,可以鉴定五个心血管阶段,包括末端 - 末端(ES)和末端diastole(ED),而无需使用标签的使用情况。我们评估了运动描述符在两个具有挑战性的多疾病, - 中心, - 扫描式短轴CMR数据集上的合理性。首先,通过报告定量措施,例如提取相的周期性框架差异。其次,通过定性地比较一般模式,当我们时间重新样本和对齐两个数据集的所有实例的运动描述符时。我们方法的ED,ES密钥阶段的平均周期框架差为0.80 \ pm {0.85} $,$ 0.69 \ pm {0.79} $,比观察者间的可变性略好($ 1.07 \ pm {0.86} $, $ 0.91 \ pm {1.6} $)和监督基线方法($ 1.18 \ pm {1.91} $,$ 1.21 \ pm {1.78} $)。代码和标签将在我们的GitHub存储库中提供。 https://github.com/cardio-ai/cmr-phase-detection
translated by 谷歌翻译